Delft University of Technology

A Coflow-based Co-optimization Framework for High-performance Data Analytics

Authors

  • Long Cheng
  • Ying Wang
  • Yulong Pei
  • Dick Epema

Abstract

Efficient execution of distributed database operators such as joins and aggregations is critical for the performance of big data analytics. As modern CPUs become faster, reducing the network communication time of these operators in large systems is becoming increasingly important, and it also challenges current techniques. Significant performance improvements have been achieved with state-of-the-art methods, such as network-traffic reduction in the data management domain and data flow scheduling in the data communications domain. However, the techniques proposed in the two fields treat each other as black boxes, and the performance gains available from co-optimizing them have not yet been explored. In this paper, building on current research in coflow scheduling, we propose a novel Coflow-based Co-optimization Framework (CCF), which co-optimizes application-level data movement and network-level data communication for distributed operators and consequently improves their performance in large distributed environments. We present the detailed design and implementation of CCF and evaluate it experimentally using large-scale simulations of large data joins. Our results demonstrate that CCF consistently achieves shorter network communication times than current approaches in large-scale distributed scenarios.

Keywords: big data; coflow scheduling; distributed joins; network communications; data-intensive applications
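The abstract does not describe CCF's algorithm, so the following is only a minimal Python sketch, under our own simplifying assumptions, of why co-optimizing data placement with network scheduling can pay off. It assumes a non-blocking switch in which every node has one port of equal bandwidth, and it uses the standard coflow lower bound: a coflow cannot finish before its most heavily loaded ingress or egress port has moved all of its bytes. The names `chunks`, `traffic_min_assignment` and `bottleneck_greedy_assignment` are hypothetical, and the greedy heuristic is an illustrative stand-in, not the authors' method.

```python
from typing import List, Tuple

# Hypothetical input layout: chunks[i][k] = bytes of join partition k held by node i.
Chunks = List[List[float]]


def _add_partition(chunks: Chunks, k: int, d: int,
                   egress: List[float], ingress: List[float]) -> None:
    """Account for shipping every remote chunk of partition k to node d."""
    for i in range(len(chunks)):
        if i != d:  # the chunk already on node d never crosses the network
            egress[i] += chunks[i][k]
            ingress[d] += chunks[i][k]


def port_loads(chunks: Chunks, dest: List[int]) -> Tuple[List[float], List[float]]:
    """Per-node bytes sent (egress) and received (ingress) for an assignment."""
    n = len(chunks)
    egress, ingress = [0.0] * n, [0.0] * n
    for k, d in enumerate(dest):
        _add_partition(chunks, k, d, egress, ingress)
    return egress, ingress


def total_traffic(chunks: Chunks, dest: List[int]) -> float:
    """Total bytes moved over the network."""
    egress, _ = port_loads(chunks, dest)
    return sum(egress)


def cct_lower_bound(chunks: Chunks, dest: List[int], bandwidth: float) -> float:
    """Coflow completion-time lower bound: busiest port's bytes / port bandwidth."""
    egress, ingress = port_loads(chunks, dest)
    return max(egress + ingress) / bandwidth


def traffic_min_assignment(chunks: Chunks) -> List[int]:
    """Send each partition to the node that already holds its largest chunk."""
    n, p = len(chunks), len(chunks[0])
    return [max(range(n), key=lambda i: chunks[i][k]) for k in range(p)]


def bottleneck_greedy_assignment(chunks: Chunks) -> List[int]:
    """Illustrative heuristic (not CCF's algorithm): place the largest partitions
    first, each on the node that keeps the busiest port least loaded."""
    n, p = len(chunks), len(chunks[0])
    egress, ingress = [0.0] * n, [0.0] * n
    dest = [0] * p
    order = sorted(range(p), key=lambda k: -sum(row[k] for row in chunks))
    for k in order:
        best, best_peak = 0, float("inf")
        for d in range(n):
            eg, ing = egress[:], ingress[:]  # trial copies of the port loads
            _add_partition(chunks, k, d, eg, ing)
            peak = max(eg + ing)
            if peak < best_peak:
                best, best_peak = d, peak
        dest[k] = best
        _add_partition(chunks, k, best, egress, ingress)
    return dest


if __name__ == "__main__":
    chunks = [
        [5, 5, 5],  # node 0 holds the largest chunk of every partition
        [4, 1, 1],
        [1, 4, 1],
    ]
    for name, dest in [("traffic-min", traffic_min_assignment(chunks)),
                       ("bottleneck-greedy", bottleneck_greedy_assignment(chunks))]:
        print(name, dest, total_traffic(chunks, dest), cct_lower_bound(chunks, dest, 1.0))
    # traffic-min [0, 0, 0] 12.0 12.0
    # bottleneck-greedy [0, 2, 0] 13.0 7.0
```

On this small, skewed input, the traffic-minimizing placement moves 12 units, all of them through node 0's ingress port, while the greedy placement moves one extra unit but spreads the load across ports and cuts the completion-time bound from 12.0 to 7.0. Under the assumptions above, this illustrates how minimizing total traffic alone can leave a single port as the bottleneck.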

Similar resources

A Fuzzy TOPSIS Approach for Big Data Analytics Platform Selection

Big data sizes are constantly increasing. Big data analytics is where advanced analytic techniques are applied on big data sets. Analytics based on large data samples reveals and leverages business change. The popularity of big data analytics platforms, which are often available as open-source, has not remained unnoticed by big companies. Google uses MapReduce for PageRank and inverted indexes....

Big Data Analytics and Now-casting: A Comprehensive Model for Eventuality of Forecasting and Predictive Policies of Policy-making Institutions

The ability of now-casting and eventuality is the most crucial and vital achievement of big data analytics in the area of policy-making. To recognize the trends and to render a real image of the current condition and alarming immediate indicators, the significance and the specific positions of big data in policy-making are undeniable. Moreover, the requirement for policy-making institutions to ...

Satellite Conceptual Design Multi-Objective Optimization Using Co Framework

This paper focuses upon the development of an efficient method for conceptual design optimization of a satellite. There are many options for satellite subsystems that could be chosen as acceptable solutions for implementing a space system mission. Every option should be assessed based on different criteria such as cost, mass, reliability and technology constraints (complexity). In this rese...

Ontology-Based Data Integration from Heterogeneous Urban Systems: A Knowledge Representation Framework for Smart Cities

This paper presents a novel knowledge representation framework for smart city planning and management that enables the semantic integration of heterogeneous urban data from diverse sources. Currently, the combination of information across city agencies is cumbersome, as the increasingly available datasets are stored in disparate data silos, using different models and schemas for their descripti...

A General Framework for 1-D Histogram-based Image Contrast Enhancement

In this paper, a general framework for image contrast enhancement based on an optimization problem is presented. Through this optimization, the intensities can be better distributed. The algorithm is based on the fact that the histogram of the enhanced image is close to both the input image histogram and a uniform distribution simultaneously. Based on this fact, we obtain a closed form opt...


Publication date: 2017